The r² value is always positive, because the square of any number is always positive. But the correlation coefficient r can be positive or negative, depending on whether the fitted line slopes upward or downward. If the fitted line slopes downward, make your r value negative.

Why did the program give you r² instead of r in the first place? It's because r² is a useful estimate called the coefficient of determination. It tells you what percent of the total variability in the Y variable can be explained by the fitted line.

An r² value of 1 means that the points lie exactly on the fitted line, with no scatter at all.

An r² value of 0 means that your data points are all over the place, with no tendency at all for the X and Y variables to be associated.

An r² value of 0.3 (as in this example) means that 30 percent of the variance in the dependent variable is explainable by the independent variable in this straight-line model.
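To see how r and r² relate numerically, here's a minimal Python sketch. The numbers are made up for illustration; they are not the actual data behind this chapter's example:

```python
import numpy as np

# Hypothetical (X, Y) data -- not the book's actual values
x = np.array([60.0, 72.0, 85.0, 90.0, 100.0, 110.0])
y = np.array([118.0, 125.0, 121.0, 134.0, 130.0, 141.0])

# Pearson correlation coefficient r: signed, matching the slope's direction
r = np.corrcoef(x, y)[0, 1]

# Coefficient of determination: fraction of Y's variability
# explained by the fitted straight line
r_squared = r ** 2

print(f"r = {r:.3f}, r-squared = {r_squared:.3f}")
```

Because these y values trend upward with x, r comes out positive; squaring it gives the r² that a regression program would report.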

Note: Figure 18-4 also lists the Adjusted R-squared at the bottom right. We talk about the adjusted value in Chapter 17 when we explain multiple regression, so for now, you can just ignore it.

The F statistic

The last line of the sample output in Figure 17-4 presents the F statistic and associated p value (under

F-statistic). These estimates address this question: Is the straight-line model any good at all? In other

words, how much better is the straight-line model, which contains an intercept and a predictor

variable, at predicting the outcome compared to the null model?

The null model is a model that contains only a single parameter representing a constant term

with no predictor variables at all. In this case, the null model would only include the intercept.

Under α = 0.05, if the p value associated with the F statistic is less than 0.05, then adding the predictor

variable to the model makes it statistically significantly better at predicting SBP than the null model.

For this example, the p value of the F statistic is 0.013, which is statistically significant. It means using

weight as a predictor of SBP is statistically significantly better than just guessing that everyone in the

data set has the mean SBP (which is what the null model predicts).
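The comparison between the straight-line model and the null model can be sketched by fitting both and comparing their sums of squared errors. This Python example uses hypothetical weight and SBP values, not the actual data set from this chapter:

```python
import numpy as np
from scipy import stats

# Hypothetical data -- not the book's actual weight/SBP values
weight = np.array([60.0, 72.0, 85.0, 90.0, 100.0, 110.0, 64.0, 78.0])
sbp = np.array([118.0, 125.0, 121.0, 134.0, 130.0, 141.0, 116.0, 128.0])
n = len(weight)

# Null model: predict the mean SBP for everyone
sse_null = np.sum((sbp - sbp.mean()) ** 2)

# Straight-line model: least-squares fit SBP = a + b * weight
b, a = np.polyfit(weight, sbp, 1)
sse_line = np.sum((sbp - (a + b * weight)) ** 2)

# F statistic: improvement per added parameter, relative to leftover noise
df_model = 1          # one predictor added beyond the intercept
df_resid = n - 2      # n points minus two fitted parameters (a and b)
f_stat = ((sse_null - sse_line) / df_model) / (sse_line / df_resid)
p_value = stats.f.sf(f_stat, df_model, df_resid)

print(f"F = {f_stat:.2f} on {df_model} and {df_resid} df, p = {p_value:.4f}")
```

For simple straight-line regression, this F-test p value is identical to the p value on the slope coefficient, which is why programs report both.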

Scientific fortune-telling with the prediction formula

As we describe in Chapter 15, one reason to do regression in biostatistics is to develop a prediction

formula that allows you to make an educated guess about the value of a dependent variable if you know the

values of the independent variables. You are essentially developing a predictive model.

Some statistics programs show the actual equation of the best-fitting straight line. If yours doesn't, don't worry. Just substitute the coefficients of the intercept and slope for a and b in the straight-line equation: Y = a + bX.
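As a sketch, here's what that substitution looks like in Python, with purely illustrative coefficient values (your own regression output will give different numbers):

```python
# Illustrative coefficients only -- read a (intercept) and b (slope)
# from your own regression output
a = 85.0   # intercept
b = 0.5    # slope: predicted change in SBP per unit of weight

def predict_sbp(weight_value):
    """Prediction formula: Y = a + b * X."""
    return a + b * weight_value

print(predict_sbp(80.0))  # 85.0 + 0.5 * 80.0 = 125.0
```

Plug in any value of the independent variable, and the formula returns the model's educated guess for the dependent variable.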